16.1 The Challenge of Unstructured Modeling

The goal of deep learning is to scale machine learning to the kinds of challenges needed to solve artificial intelligence. This means being able to understand high-dimensional data with rich structure.

Classification discards most of the information in the input and produces a single output; the classifier is also often able to ignore many parts of the input. Probabilistic models can be asked to perform many other tasks, most of which require a complete understanding of the entire structure of the input, with no option to ignore sections of it. These tasks include the following (a small sketch follows the list):

  • Density estimation: Given an input \(x\), the ML system returns an estimate of the true density \(p(x)\) under the data-generating distribution.
  • Denoising: Given a damaged or incorrectly observed input \(\tilde{x}\), the ML system returns an estimate of the original or correct \(x\).
  • Missing value imputation: Given observations of some elements of \(x\), the model is asked to return estimates of, or a probability distribution over, some or all of the unobserved elements of \(x\).
  • Sampling: The model generates new samples from the distribution \(p(x)\).
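As a concrete illustration, the sketch below performs all four tasks with a 2-D Gaussian standing in for the model \(p(x)\). The Gaussian model, the noise variance, and all variable names are illustrative assumptions, not anything prescribed by the text.

```python
# A minimal sketch of the four tasks using a 2-D Gaussian as the model p(x).
# The Gaussian, the noise level `noise_var`, and all names are illustrative
# assumptions.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Pretend "training data" drawn from an unknown data-generating distribution.
data = rng.multivariate_normal(mean=[1.0, -2.0],
                               cov=[[2.0, 0.8], [0.8, 1.0]], size=5000)

# Fit the model: maximum-likelihood mean and covariance.
mu = data.mean(axis=0)
Sigma = np.cov(data, rowvar=False)
model = multivariate_normal(mean=mu, cov=Sigma)

# 1) Density estimation: estimate p(x) at a query point.
x_query = np.array([0.5, -1.5])
print("estimated density p(x):", model.pdf(x_query))

# 2) Sampling: generate new samples from the model.
print("samples:\n", model.rvs(size=3))

# 3) Missing-value imputation: observe x1, infer a distribution over x2.
#    For a Gaussian, p(x2 | x1) is Gaussian with the standard conditioning formulas.
x1_obs = 0.5
cond_mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x1_obs - mu[0])
cond_var = Sigma[1, 1] - Sigma[1, 0] ** 2 / Sigma[0, 0]
print("p(x2 | x1=0.5) ~ Normal(%.3f, %.3f)" % (cond_mean, cond_var))

# 4) Denoising: observe x_tilde = x1 + Gaussian noise, return the posterior
#    mean of x1 (one-dimensional illustration on the first coordinate).
noise_var = 0.5
x_tilde = 2.0
post_mean = (Sigma[0, 0] * x_tilde + noise_var * mu[0]) / (Sigma[0, 0] + noise_var)
print("denoised estimate of x1:", post_mean)
```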

In general, if we wish to model a distribution over a random vector \(x\) containing \(n\) discrete variables capable of taking \(k\) values each, then the naive approach of representing \(P(x)\) by storing a lookup table with one probability value per possible outcome requires \(k^n\) parameters. This is not feasible, for several reasons (see the sketch after the list):

  • Memory: the cost of storing the representation.
  • Statistical efficiency: as the number of parameters in a model increases, so does the amount of training data needed to choose the values of those parameters using a statistical estimator.
  • Runtime (the cost of inference): Suppose we want to perform an inference to compute \(P(x_1)\) or \(P(x_1 \mid x_2)\). Computing these distributions requires summing across the entire table.
  • Runtime (the cost of sampling): Suppose we want to draw a sample from the model. The naive way is to sample a value \(u \sim U(0, 1)\), then iterate through the table, adding up the probability values until the running sum exceeds \(u\), and return the outcome corresponding to that position in the table. In the worst case this requires reading through the whole table, so it has the same exponential cost as the other operations.
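The sketch below makes these costs concrete for a small lookup table. The values of \(n\) and \(k\) and the random table entries are illustrative assumptions: the table has \(k^n\) entries, both \(P(x_1)\) and \(P(x_1 \mid x_2)\) are computed by summing across the whole table, and the naive inverse-CDF sampler walks the table entry by entry.

```python
# A minimal sketch of the naive lookup-table representation of P(x) for n
# discrete variables with k values each; n, k, and the random table are
# illustrative assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3                      # already 3**4 = 81 entries; k**n explodes quickly
outcomes = list(itertools.product(range(k), repeat=n))

# One probability value per joint outcome: k**n parameters.
table = rng.random(len(outcomes))
table /= table.sum()
print("number of parameters:", len(table))   # k**n

# Inference: P(x1 = v) requires summing over the entire table.
def marginal_x1(v):
    return sum(p for x, p in zip(outcomes, table) if x[0] == v)

# Inference: P(x1 = v | x2 = w) also sums across the table (twice).
def conditional_x1_given_x2(v, w):
    joint = sum(p for x, p in zip(outcomes, table) if x[0] == v and x[1] == w)
    evidence = sum(p for x, p in zip(outcomes, table) if x[1] == w)
    return joint / evidence

print("P(x1=0)        =", marginal_x1(0))
print("P(x1=0 | x2=1) =", conditional_x1_given_x2(0, 1))

# Sampling: draw u ~ U(0, 1) and walk the table until the running sum exceeds u.
# Worst case this reads the whole table, so it is also exponential in n.
def naive_sample():
    u = rng.random()
    running = 0.0
    for x, p in zip(outcomes, table):
        running += p
        if running > u:
            return x
    return outcomes[-1]           # guard against floating-point round-off

print("sample:", naive_sample())
```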

The problem with the table-based approach is that we are explicitly modeling every possible kind of interaction between every possible subset of variables. The real tasks we encounter are much simpler than this: most variables influence each other only through a small number of direct interactions.

Structured probabilistic models provide a formal framework for modeling only direct interactions between random variables. This allows the models to have (see the sketch after the list)

  • significantly fewer parameters,
  • and therefore be estimated reliably from less data,
  • with dramatically reduced computational cost in terms of storing the model, performing inference in the model, and drawing samples from the model.
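As a rough illustration of the savings, the sketch below compares the parameter count of a full joint table with that of a chain-structured model \(P(x_1)\prod_{i=2}^{n} P(x_i \mid x_{i-1})\). The chain structure is an assumed example of modeling only direct interactions, not something fixed by the text.

```python
# A minimal sketch of the parameter savings from modeling only direct
# interactions: compare a full joint table with a chain-structured model
# P(x1) * P(x2|x1) * ... * P(xn|x_{n-1}). The chain structure is an
# illustrative assumption about which direct interactions exist.

def full_table_params(n, k):
    # One free probability per joint outcome (minus one for normalization).
    return k ** n - 1

def chain_model_params(n, k):
    # P(x1) needs k-1 values; each P(x_i | x_{i-1}) needs k*(k-1) values.
    return (k - 1) + (n - 1) * k * (k - 1)

for n in (5, 10, 20):
    k = 10
    print(f"n={n:2d}, k={k}: full table {full_table_params(n, k):,d} "
          f"vs chain {chain_model_params(n, k):,d}")
```

For \(n = 20\) binary-style counts with \(k = 10\), the full table needs on the order of \(10^{20}\) parameters while the chain model needs fewer than two thousand, which is the kind of gap that makes structured models estimable from realistic amounts of data.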